Using chunked corpora for the acquisition of collocations and idiomatic expressions
Abstract
This paper¹ discusses the use of recursive chunking of large German corpora (over 300 million words) for the identification and partial classification of significant lexical cooccurrences of adjectives and verbs. The goal is to provide a fine-grained syntactic classification of the data at the levels of subcategorization and scrambling. We analyze the combinatory preferences of adjectives with verbs in predicative(-like) constructions subcategorizing a finite or infinitival clause. Adjective+verb collocations have so far not been analyzed in much detail. The corpus data suggest that there is in many cases a clear correlation not only between collocational preferences, distributional properties, and subcategorization frames, but also between all of these and semantic classes of adjectives. Some adjective+verb combinations seem to be idiomatic, or lexicalized as complex predicates. The availability of a robust broad-coverage chunker for German, along with an automatic partial classification into different phenomena, is a prerequisite for this kind of detailed analysis of collocations and idiomatic expressions.

¹ This work has been carried out in the framework of the Transferbereich 32: Automatische Exzerption, a DFG-funded project aiming at the creation of support tools for the corpus-based updating of printed dictionaries in lexicography, carried out in cooperation with the publishers Duden BIFAB AG and Langenscheidt KG.
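As a rough illustration of what identifying "significant lexical cooccurrences" of adjectives and verbs can look like once a chunker has delivered adjective+verb pairs, the sketch below ranks pairs by pointwise mutual information (PMI). The abstract does not say which association measure the authors use, so PMI is an assumption here, as are the function name `score_pairs`, the `min_count` threshold, and the toy German pairs.

```python
import math
from collections import Counter


def score_pairs(pairs, min_count=2):
    """Rank (adjective, verb) pairs by pointwise mutual information.

    `pairs` is a list of (adjective, verb) tuples, one per corpus occurrence,
    e.g. as extracted from chunked predicative constructions (hypothetical input).
    Pairs seen fewer than `min_count` times are dropped as unreliable.
    """
    pair_counts = Counter(pairs)
    adj_counts = Counter(adj for adj, _ in pairs)
    verb_counts = Counter(verb for _, verb in pairs)
    total = len(pairs)

    scores = {}
    for (adj, verb), observed in pair_counts.items():
        if observed < min_count:
            continue
        # Count expected if adjective and verb were independent.
        expected = adj_counts[adj] * verb_counts[verb] / total
        scores[(adj, verb)] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data, not taken from the paper's corpus.
pairs = (
    [("klar", "sein")] * 4
    + [("klar", "werden")] * 3
    + [("schwer", "fallen")] * 3
    + [("schwer", "sein")] * 1
    + [("wichtig", "sein")] * 5
)

ranked = score_pairs(pairs)
for (adj, verb), pmi in ranked:
    print(f"{adj} + {verb}: PMI = {pmi:.3f}")
```

On this toy input the pair with a rare, selective verb ("schwer" + "fallen") ranks highest, while combinations with the frequent copula "sein" score low. PMI is known to overrate low-frequency pairs, which is why the sketch applies a frequency threshold; a real study would likely compare several association measures.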
Similar resources
(Un)Translatability of Persian Idiomatic Expressions to English in Political Discourse
The present study sought to investigate the extent to which Persian idiomatic expressions would influence the western translators' strategies in providing the ultimate product in English, and it also attempted to uncover the underlying assumptions in target text, then to suggest some weighty strategies to overcome difficulties with translation. For this purpose, the data was analyzed within the...
The Impact of Multimodal Channels on Teaching Idiomatic Expressions to Intermediate EFL Learners with Regard to Their Attitudes
This study explored the facilitative function of using multimodal channels over single-channel presentation in the comprehension of idiomatic expressions by Iranian EFL learners of intermediate proficiency. Out of a pool of 90, sixty intermediate participants were homogenized by a QPT test, using a quasi-experimental design. They were randomly assigned to three equal groups: WhatsApp-, SMS- and Cla...
Mining the Web for Idiomatic Expressions Using Metalinguistic Markers
In this paper, methods for identification and delimitation of idiomatic expressions in large Web corpora are presented. The proposed methods are based on the observation that idiomatic expressions are sometimes accompanied by metalinguistic expressions, e.g. the word “proverbial”, the expression “as they say” or quotation marks. Even though the frequency of such idiom-related metalinguistic mar...
Textuality of Idiomatic Expressions in Cameroon English
The meaning of an idiomatic expression cannot be transparently worked out from the meanings of its constituent words due to its figurative and unpredictable nature. Consequently, the syntactic composition and the structural paradigm of an idiomatic expression are supposed to be the same in every context. However, this is not the case in the institutionalized second language varieties of English...
Idiomatic Object Usage and Support Verbs
Every language contains complex expressions that are language-specific. The general problem when trying to build automated translation systems or human-readable dictionaries is to detect expressions that can be used idiomatically and then whether the expressions can be used idiomatically in a particular text, or whether a literal translation would be preferred. It follows from the definition...